An Alternative to Low-level-Synchrony-Based Methods for Speech Detection

Authors

  • Paul Ruvolo
  • Javier R. Movellan
Abstract

Determining whether someone is talking has applications in many areas, such as speech recognition, speaker diarization, social robotics, facial expression recognition, and human-computer interaction. One popular approach to this problem is audio-visual synchrony detection [10, 21, 12]: a candidate speaker is deemed to be talking if the visual signal around that speaker correlates with the auditory signal. Here we show that with the proper visual features (in this case, movements of various facial muscle groups), a very accurate speech detector can be created that does not use the audio signal at all. Further, we show that this person-independent, visual-only detector can be used to train very accurate audio-based, person-dependent voice models. The voice model has the advantage of being able to identify when a particular person is speaking even when they are not visible to the camera (e.g., in the case of a mobile robot). Moreover, we show that a simple sensory fusion scheme between the auditory and visual models improves performance on the task of talking detection. The work here provides dramatic evidence for the efficacy of two very different approaches to multimodal speech detection on a challenging database.
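
The two ideas named in the abstract, correlation-based synchrony detection and fusion of an auditory and a visual model, can be illustrated with a minimal sketch. The abstract does not specify the features, the classifiers, or the fusion rule, so everything below is an assumption made for illustration: `pearson` stands in for a generic synchrony score between two per-frame signals, and `fuse` combines two per-frame posteriors by adding log-odds, which is valid only if the two modalities are conditionally independent given the talking state. Neither function is taken from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length per-frame signals.

    Hypothetical stand-in for a synchrony score: xs could be a visual
    motion feature and ys an audio energy feature over the same frames.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant signal carries no synchrony evidence
    return cov / math.sqrt(vx * vy)


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def fuse(p_visual, p_audio, prior=0.5):
    """Fuse P(talking | visual) and P(talking | audio) into one posterior.

    Assumes (for illustration only) that the two modalities are
    conditionally independent given the talking state, so the posterior
    log-odds add and the shared prior is subtracted once.
    Inputs must lie strictly between 0 and 1.
    """
    z = logit(p_visual) + logit(p_audio) - logit(prior)
    return 1.0 / (1.0 + math.exp(-z))
```

Under these assumptions, two detectors that each report 0.8 fuse to a posterior above 0.8, while a detector reporting exactly the prior leaves the other detector's estimate unchanged.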


Similar resources

Design and evaluation of validity of an electronic alternative and augmentative communication system for Persian-speaking children

Introduction: Due to the high prevalence of communication disorders, augmentative and alternative communication methods are one of the options available for addressing the problems of people affected by them. Since there are no complex tools for Persian-speaking children with communication disorders, we decided to design communication assistant software for these children that produces sound output. Materials and Meth...


Phased array ultrasonic imaging using an improved beamforming based total focusing method for non destructive test

One of the novel phased-array-based scanning methods for ultrasonic imaging in non-destructive testing is the total focusing method (TFM). This method uses the maximum available information from the phased array elements and yields improved defect detection accuracy compared to conventional scanning methods. Despite its high detection accuracy, TFM performs poorly at distinguishing the real ...



A Saliency Detection Model via Fusing Extracted Low-level and High-level Features from an Image

Salient regions attract more human attention than other regions in an image. Low-level and high-level features are used in salient region detection. Low-level features contain primitive information such as color or texture, while high-level features usually consider visual systems. Recently, some salient region detection methods have been proposed based on only low-level features or hig...


Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting

Islamic Republic of Iran Broadcasting (IRIB), as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...




Publication date: 2010